THE MATHEMATICA POLICY RESEARCH VALUE-ADDED MODEL 

A. Introduction 

New Leaders for New Schools, a nonprofit organization committed to training school 
principals, heads the Effective Practices Incentive Community (EPIC), an initiative that offers 
financial awards to effective educators. New Leaders and its partner organizations have received 
tens of millions of dollars from the U.S. Department of Education to support EPIC. Through this 
initiative, New Leaders offers financial awards to educators in two urban school districts and a 
consortium of charter schools. Awards are meant to serve as both a reward for principals and 
instructional staff in schools that are effective in raising student achievement and a financial 
incentive to document effective practices at award-winning schools. New Leaders publicizes its 
findings on effective practices online. 

New Leaders contracted with Mathematica Policy Research to help design the methods for 
identifying effective schools and teachers. The approach used for each partner differs, depending on 
the priorities of the partner and the type of information available to measure school and teacher 
performance. This report presents the methods used to identify effective schools and teachers for a 
consortium of over 140 charter schools in 17 states and the District of Columbia during the second 
year of this project. Mathematica will work with New Leaders and the charter school consortium to 
revise the model in future years and incorporate additional data that become available. 

This year’s model differs from that of last year in four respects. First, we used a shrinkage 
method to help ensure that schools (and teachers) with small numbers of students in our model 
were not overrepresented at the top (and bottom) of the resulting performance measures. Second, 
we added teacher-level estimates for charter schools. Third, we estimated models using two years of 
performance data rather than just one. 1 Finally, we changed the treatment of schools that cover 
multiple grade spans. Last year, those schools had multiple chances to win awards (one for 
each grade span). 2 This year, we altered our model so that each school had only one chance to win an 
award. More details on these changes are presented below in the discussion of this year’s methods. 

B. Method for Measuring School and Teacher Effectiveness 

Many commonly used measures of student outcomes aggregated at the classroom or school 
level, such as average test score levels or the percentage of students meeting the state proficiency 
standards, do not provide an accurate measure of the effectiveness of schools or teachers. This is 
because they are likely to be affected by students’ prior abilities and accumulated achievements, as 
well as such other factors as parents’ socioeconomic status. Better measures of effectiveness focus 
on how much a school or teacher contributes to the test score improvements of its students. 
Mathematica follows this approach, basing its measures on student test score growth. 

1 Last year we estimated models using only one year of performance data (that is, one cohort of students in each grade 
level). This year we estimated models using both one and two years of performance data. NLNS considered both 
models for the schools. The one- and two-year estimates for the schools were correlated at 0.94. NLNS gave out school 
awards based on the one-year models and teacher awards based on the two-year models. Using two-year 
models is particularly important for teachers because teacher estimates are generally less precise than school estimates, 
owing to their small student sample sizes. In addition, teacher estimates are not stable over time (Sass 2008). 

2 Although schools could participate in more than one grade span category (for example, both the elementary and middle 
school categories for schools serving kindergarten through eighth grade), schools ultimately could receive no more than one monetary 
award, even if they were top-ranked in more than one category. 

This technique, called a “value-added model” (VAM), has been used by several prominent 
researchers (Meyer 1996; Sanders 2000; McCaffrey et al. 2004; Raudenbush 2004; Hanushek et al. 
2007). VAMs aim to measure students’ achievement growth relative to their own previous 
achievement levels. Many VAMs also control for such student characteristics as eligibility for free or 
reduced price lunch to account for factors that systematically affect the academic growth of different 
types of students. Thus, VAMs account for both the students’ starting point and the factors 
affecting their growth over the year. Because a value-added model accounts for differences in initial student 
performance across schools, schools and teachers whose students have low baseline scores can still be 
identified as high performers, and vice versa. 

A VAM provides a better measure of effectiveness than relying on gains in the proportion of 
students achieving proficiency. Proficiency gains measure growth only for students who cross the 
proficiency cut-point, but VAMs incorporate achievement gains for all students, regardless of their 
baseline achievement levels. In addition, unlike school-wide proficiency rates, which are affected by 
changes in the composition of the student population, VAMs examine the achievement growth of 
individual students over time. (See Potamites and Chaplin [2008] for more details.) 
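A toy example makes this difference concrete. The sketch below uses made-up standardized scores for two hypothetical schools. It assumes a proficiency cut-point of 0 and compares the change in each school's proficiency rate with its average score gain. It illustrates the argument only and is not part of the Mathematica model.

```python
# Toy comparison of proficiency-rate change and average score growth.
# All scores and the cut-point are hypothetical; scores are in standard deviation units.
CUT_POINT = 0.0

def proficiency_rate(scores, cut=CUT_POINT):
    """Share of students at or above the proficiency cut-point."""
    return sum(s >= cut for s in scores) / len(scores)

def summarize(prior, current):
    """Return (change in proficiency rate, mean score gain) for one school."""
    rate_change = proficiency_rate(current) - proficiency_rate(prior)
    mean_gain = sum(c - p for p, c in zip(prior, current)) / len(prior)
    return rate_change, mean_gain

# School A: large gains, but every student stays below the cut-point.
school_a = summarize(prior=[-1.5, -1.2, -1.0, -0.9], current=[-0.6, -0.3, -0.2, -0.1])
# School B: small gains that happen to push two students across the cut-point.
school_b = summarize(prior=[-0.1, -0.05, 0.2, 0.4], current=[0.0, 0.05, 0.25, 0.45])

print("School A: proficiency change %.2f, mean gain %.2f" % school_a)
print("School B: proficiency change %.2f, mean gain %.2f" % school_b)
```

School A shows no proficiency gain even though its students grew far more than School B's. A growth-based measure registers that growth; a proficiency-based measure does not.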

Ideally, VAMs estimate unbiased teacher and school effects. If students were randomly assigned 
to schools or classrooms and we had complete data on all students, our estimates would be 
unbiased. These conditions are unlikely to hold in practice, which means that our VAM estimates could be biased by 
unobserved factors that affect performance and are correlated with the schools or classrooms where 
a student is placed (Rothstein 2009). We control for prior test scores and observable characteristics 
in order to reduce the likelihood of such bias. 3 Kane and Staiger (2008) offer some evidence 
suggesting that unobserved student characteristics related to how students are assigned do not play a large 
role in determining VAM scores. Using data from the Los Angeles Unified School District, they 
compared (1) the difference in value-added measures between pairs of teachers, estimated in the typical 
situation in which principals assign students to teachers, with (2) the difference in student 
achievement between the same teachers the following year, when they taught classrooms that were 
formed by principals but then randomly assigned to the teachers. Kane and Staiger found that the 
differences between teachers’ VAM scores before random assignment were a statistically significant 
and positive predictor of achievement differences when classrooms were assigned randomly. 
Because these results were gathered in schools in which the principal was willing to allow random 
assignment of classrooms to teachers, however, it is not clear whether they generalize to other contexts. 

Mathematica uses a VAM to estimate the effect of each charter school and teacher on student 
performance, controlling for the prior performance of those students. Key aspects of the 
Mathematica model are outlined here; a more detailed technical description is in the appendix. 
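To make the idea concrete, here is a minimal sketch of a school-level VAM on simulated data. It assumes a simple linear specification: the current score is regressed on the prior score plus one indicator per school, estimated by ordinary least squares. It omits the demographic controls, instrumental variables, dosage weights, and shrinkage discussed below. The function name `estimate_school_effects` and all simulation parameters are hypothetical. The appendix describes the specification actually used.

```python
import numpy as np

def estimate_school_effects(prior, current, school, n_schools):
    """OLS value-added sketch: current = beta * prior + effect[school] + error.

    Returns (beta, effects), with effects expressed relative to the average school.
    """
    n = len(current)
    # One indicator column per school (no separate intercept, so every school gets a coefficient).
    indicators = np.zeros((n, n_schools))
    indicators[np.arange(n), school] = 1.0
    X = np.column_stack([prior, indicators])
    coef, *_ = np.linalg.lstsq(X, current, rcond=None)
    beta, effects = coef[0], coef[1:]
    return beta, effects - effects.mean()

# Simulated data: 20 hypothetical schools with 200 students each.
rng = np.random.default_rng(0)
n_schools, per_school = 20, 200
true_effects = rng.normal(0.0, 0.2, n_schools)
school = np.repeat(np.arange(n_schools), per_school)
# Schools differ in baseline achievement, so raw score levels alone would mislead.
prior = rng.normal(0.0, 1.0, school.size) + rng.normal(0.0, 0.5, n_schools)[school]
current = 0.7 * prior + true_effects[school] + rng.normal(0.0, 0.5, school.size)

beta, effects = estimate_school_effects(prior, current, school, n_schools)
print(round(beta, 2))
print(np.corrcoef(effects, true_effects - true_effects.mean())[0, 1])
```

Each school's coefficient is estimated conditional on its students' prior scores. As a result, a school whose students start low can still receive a high estimate.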

3 Models were run both with and without other observable characteristics such as free and reduced price lunch 
status, English language learner status, special education status, gender, and ethnicity. 

1. Data Requirements for Participation 

In order to estimate models covering two years of performance, each charter school was asked 
to provide data on at least two cohorts of students. For each cohort we needed math and reading 
test scores and student demographics for all tested students in all tested grades, except for students 
for whom baseline test scores were not available. For instance, in states that begin testing in third 
grade, elementary schools were not expected to provide past test scores for their third graders. 
Neither elementary schools nor middle schools were expected to provide baseline test scores for 
students in the lowest grade served by the school. High schools in states that did not test students in 
multiple high school grades, however, were expected to provide middle school baseline scores (typically 
from eighth grade) for their students who had attended a public school in the same state. 4 All schools 
that provided data on current and past test scores for at least 15 students were included in the model. 
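The inclusion rule is simple to express in code. The sketch below assumes a hypothetical record layout: one tuple per student with a school ID, a current score, and a prior score, where `None` marks a missing score. It keeps only schools with at least 15 students who have both scores.

```python
from collections import Counter

MIN_STUDENTS = 15  # threshold stated in the text

def eligible_schools(records, min_students=MIN_STUDENTS):
    """Return the set of school IDs with enough students having current and prior scores.

    `records` is an iterable of (school_id, current_score, prior_score) tuples;
    this layout is a hypothetical stand-in for the schools' data files.
    """
    counts = Counter(
        school_id
        for school_id, current, prior in records
        if current is not None and prior is not None
    )
    return {school_id for school_id, n in counts.items() if n >= min_students}

# Hypothetical example: school "A" has 15 complete records; school "B" has 14
# complete records plus 10 students in the lowest grade with no prior score.
records = [("A", 0.1, 0.0)] * 15 + [("B", 0.2, 0.1)] * 14 + [("B", 0.3, None)] * 10
print(eligible_schools(records))  # {'A'}
```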

There were 145 charter schools in the second-year EPIC consortium with the necessary data for 
inclusion in the model. 5 Those schools represent 17 states and the District of Columbia: 30 schools 
from California, 22 from the District of Columbia, 17 from Florida, 12 from Illinois, 10 from 
Massachusetts, 8 from Pennsylvania, 7 each from Colorado and New York, 6 each from Michigan 
and Texas, 5 each from Georgia and Indiana, 3 each from Louisiana and Ohio, and 1 each from 
Hawaii, Minnesota, Missouri, and New Mexico. 

As mentioned in the introduction, schools that covered multiple grade spans were eligible last 
year for at most one award but could win that award based on their performance in any of the grade 
spans served. This year, we changed our methods so that each school competed in no more than 
one grade span. Schools were classified according to the majority of students served. Schools were 
ranked in the high school category if at least 50 percent of their students in the model were in grades 
9 to 12. Middle schools were defined as schools that were not high schools but had at least 50 
percent of their students in the model in grades 7 to 12. The elementary school category included all 
other schools. Furthermore, rather than running separate models for each grade span, all schools 
were included in the same model (with indicators for the grade level); the estimated coefficients then 
were ranked within each grade span. Of the 145 schools included in the analysis, 83 were classified 
as elementary schools, 37 as middle schools, and 25 as high schools. Of the 83 elementary 
schools, 47 also had students in 7th grade or above, including one school 
with students in 9th grade or above. Of the 25 high schools, only one also had students in 7th or 8th 
grade. 
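The grade-span rule reads naturally as a small function. In this sketch, `grade_counts` is a hypothetical mapping from grade (0 for kindergarten) to the number of the school's students in the model. The thresholds are the ones stated above.

```python
def classify_school(grade_counts):
    """Assign a school to one grade span using the majority rule described in the text."""
    total = sum(grade_counts.values())
    share_9_12 = sum(n for g, n in grade_counts.items() if 9 <= g <= 12) / total
    share_7_12 = sum(n for g, n in grade_counts.items() if 7 <= g <= 12) / total
    if share_9_12 >= 0.5:
        return "high"
    if share_7_12 >= 0.5:
        return "middle"
    return "elementary"

# A school with tested grades 3-8 and most students in grades 3-6 is ranked as elementary.
print(classify_school({3: 60, 4: 60, 5: 55, 6: 50, 7: 45, 8: 40}))  # elementary
# A grade 6-12 school with 40 percent of students in grades 9-12 is ranked as middle.
print(classify_school({6: 25, 7: 20, 8: 15, 9: 15, 10: 10, 11: 10, 12: 5}))  # middle
```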

The teacher model included 908 teachers from the 114 schools with sufficient data available to 
estimate teacher effects. Sufficient data meant having at least 15 students with student-teacher links, 
end-of-year test scores, and baseline test scores. Teachers were classified based on the grades of the 
courses they taught; those who taught courses in different grades could be eligible for awards based on 
their performance averaged across multiple spans. There were 572 teachers included 
in the elementary grade range, 233 in the middle school grades, and 103 in the high school grades. 

4 Only three of the high schools included in our models were from states that tested students in multiple high school grades. 
These high schools were not required to obtain pre-high school test score data. 

5 One charter school submitted data and was included in our analyses but requested that it not be considered for 
an award. 

2. Test Score Standardization 

Because the VAM includes test scores for multiple grades, subjects, and years, as well as scores 
from different states that administer different exams, the scores must be standardized so they fit 
comparable scales. Mathematica transforms the test scores by subtracting from each student’s score 
the statewide mean for that test, subject, grade, and year, and dividing by the statewide standard 
deviation for those categories. This yields a standardized score that expresses each student’s performance 
relative to the average student in the state and that is comparable across schools within each state. 
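In code, the within-state standardization is a lookup followed by a z-score. The statewide means and standard deviations below are invented for illustration. In practice they would come from statewide statistics for each test, subject, grade, and year.

```python
# Hypothetical statewide statistics keyed by (state, subject, grade, year).
STATE_STATS = {
    ("CA", "math", 5, 2008): (350.0, 50.0),  # (mean, standard deviation)
    ("FL", "reading", 8, 2008): (1800.0, 300.0),
}

def standardize(raw_score, state, subject, grade, year, stats=STATE_STATS):
    """Convert a raw scale score to standard deviation units relative to the state average."""
    mean, sd = stats[(state, subject, grade, year)]
    return (raw_score - mean) / sd

print(standardize(400.0, "CA", "math", 5, 2008))      # 1.0
print(standardize(1650.0, "FL", "reading", 8, 2008))  # -0.5
```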

To allow comparison of test scores across different states, Mathematica adjusts student scores 
using state average scores and standard deviations from the National Assessment of Educational 
Progress (NAEP). Details of the adjustment method are given in the appendix. 
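The appendix gives the adjustment actually used. As a rough illustration only, one common linear linking approach works as follows. It assumes that a student's standing within the state on the state test carries over to the state's NAEP distribution, and it then re-expresses that standing relative to the national NAEP distribution. The sketch below follows that assumption. The function name and all NAEP figures are hypothetical placeholders, not published NAEP statistics.

```python
def to_national_scale(state_z, naep_state_mean, naep_state_sd, naep_nat_mean, naep_nat_sd):
    """Map a within-state z-score onto a national scale using NAEP moments.

    Assumption (not necessarily the report's method): the student sits at the same
    relative position in the state's NAEP distribution as on the state test.
    """
    implied_naep_score = naep_state_mean + state_z * naep_state_sd
    return (implied_naep_score - naep_nat_mean) / naep_nat_sd

# Hypothetical values: the state's NAEP mean is 5 points below a national mean of 240,
# and both standard deviations are 30 points.
print(to_national_scale(0.0, 235.0, 30.0, 240.0, 30.0))  # about -0.167
```

Under this assumption, an average student in a state that scores below the national NAEP average receives a negative score on the common scale.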

3. The Value-Added Model 

A student’s performance on a single test is an imperfect measure of ability, so Mathematica 
employs a statistical technique known as “instrumental variable estimation” to obtain a more 
accurate measure of prior student achievement. By instrumenting for the prior math score with the 
prior reading score and vice versa, the Mathematica model incorporates information on students’ 
performance on the tests in both subjects in the prior year to measure prior student achievement. 6 
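Why use an instrument? Measurement error in the prior score biases its coefficient toward zero. A second, independently measured test of the same underlying achievement can correct for that bias. The simulation below is a minimal two-stage least squares sketch under assumed parameters: a single prior-achievement factor is measured with independent errors in math and reading. It omits the other model terms and does not compute the adjusted standard errors a full IV estimator would report.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000
ability = rng.normal(0.0, 1.0, n)                # true prior achievement (unobserved)
prior_math = ability + rng.normal(0.0, 0.5, n)   # noisy prior math score
prior_read = ability + rng.normal(0.0, 0.5, n)   # noisy prior reading score (instrument)
current_math = 0.8 * ability + rng.normal(0.0, 0.5, n)

ones = np.ones(n)

# Naive OLS: the coefficient on the noisy prior score is attenuated toward zero.
X = np.column_stack([ones, prior_math])
ols_coef, *_ = np.linalg.lstsq(X, current_math, rcond=None)

# Two-stage least squares: first predict prior math from prior reading,
# then regress current math on that prediction.
Z = np.column_stack([ones, prior_read])
first_stage, *_ = np.linalg.lstsq(Z, prior_math, rcond=None)
prior_math_hat = Z @ first_stage
X_hat = np.column_stack([ones, prior_math_hat])
iv_coef, *_ = np.linalg.lstsq(X_hat, current_math, rcond=None)

print(f"OLS slope: {ols_coef[1]:.2f}, IV slope: {iv_coef[1]:.2f}")
```

The instrument works here because the reading score's measurement error is assumed to be independent of the math score's. As the text notes, the roles are also reversed to instrument the prior reading score with prior math.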

The Mathematica VAM aims to measure how much a given school or teacher has raised student 
test scores, after accounting for factors out of the school’s control. In addition to a student’s test 
score in a particular subject in the previous grade, the full VAM includes a set of variables that 
statistically control for factors that can affect the academic growth of individual students: free or 
reduced price lunch status, limited English proficiency, special education status, gender, and 
ethnicity. As mentioned previously, a version of the model was also run that included only the 
previous test score, not the other contextual factors. There are advantages and disadvantages to 
including these other variables. School rankings are very similar under either method (the correlation 
of school rankings in the one-year models was 0.988). NLNS used the model without other 
contextual factors to award schools and teachers in Year 2. 
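The 0.988 figure is a correlation between two sets of school rankings. One minimal way to compute such a rank correlation, assuming no tied estimates, is to rank each set of estimates and take the Pearson correlation of the ranks (Spearman's rho). The estimates below are made up.

```python
def ranks(values):
    """Rank values from 1 (largest) to n; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result

def rank_correlation(a, b):
    """Pearson correlation of the ranks of a and b (Spearman's rho without ties)."""
    ra, rb = ranks(a), ranks(b)
    n = len(ra)
    mean = (n + 1) / 2  # both rank lists are permutations of 1..n
    cov = sum((x - mean) * (y - mean) for x, y in zip(ra, rb))
    var = sum((x - mean) ** 2 for x in ra)
    return cov / var

# Hypothetical school estimates from a full model and a prior-score-only model.
full = [0.31, 0.12, -0.05, 0.20, -0.22, 0.02]
prior_only = [0.29, 0.15, -0.02, 0.18, -0.25, -0.04]
print(round(rank_correlation(full, prior_only), 3))  # 0.943
```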

Dosage Variables Constructed. The Mathematica model also accounts for the enrolled time 
of students who changed schools or teachers during the school year. It allocates credit to a school 
based on the fraction of time the student spent at the school, known as the school “dosage.” Thus, 
the model incorporates students who attended the charter school for only part of the year. Dosage 
works similarly in the teacher model, allocating credit to a teacher based on the fraction of the time 
the student spent in that teacher’s classroom. A student can contribute to estimates for multiple 
schools/teachers in a single year if he or she switches schools/classrooms during the year. 
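A minimal sketch of the dosage calculation follows. It assumes enrollment is recorded as days at each school (or in each teacher's classroom) and uses a hypothetical 180-day school year as the denominator; the report does not state how time is counted. One natural way to use the resulting fractions, assumed here, is to put them in place of 0/1 school indicators in the regression. Each school is then credited in proportion to the student's time enrolled.

```python
SCHOOL_YEAR_DAYS = 180  # hypothetical length of the school year

def dosage(enrollment_days, year_days=SCHOOL_YEAR_DAYS):
    """Map each school (or teacher) to the fraction of the year the student spent there."""
    return {unit: days / year_days for unit, days in enrollment_days.items()}

# A hypothetical student who spent 100 days at school A, then 80 days at school B.
weights = dosage({"A": 100, "B": 80})
print(weights)
```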

Shrinkage Estimator Used. Some VAMs implicitly and inadvertently favor smaller schools 
because of greater random variation found in smaller samples of students — that is, as a result of the 
luck of the draw in any particular year rather than actual performance differences (Kane and Staiger 
2002). Similarly, teachers having fewer classes or smaller class sizes also may have a greater random 
variation due to their smaller student samples. One way of dealing with this phenomenon is to use a 
“shrinkage estimator,” a statistical technique that “shrinks” the school or teacher effects toward an 
average of a larger group of schools or teachers, with greater shrinkage for schools or teachers 
whose results were less precisely estimated — typically those with fewer students. This year, 
Mathematica implemented a shrinkage estimator for schools and teachers. (Details can be found in 
the appendix.) 
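One common form of shrinkage is an empirical Bayes estimator, sketched below under stated assumptions. The true effects are taken to vary around a simple average, with a variance estimated from the data. Each estimate is pulled toward that average by a weight that falls as its standard error rises. This sketch illustrates the general technique only; the estimator Mathematica actually used is documented in the appendix and may differ in its details.

```python
from statistics import fmean, pvariance

def shrink(estimates, std_errors):
    """Empirical Bayes shrinkage of noisy effect estimates toward their mean.

    Returns (shrunk_estimates, shrunk_std_errors).
    """
    grand_mean = fmean(estimates)
    # Variance of the true effects: total variance minus average sampling variance.
    tau2 = max(0.0, pvariance(estimates) - fmean(se ** 2 for se in std_errors))
    shrunk, shrunk_se = [], []
    for est, se in zip(estimates, std_errors):
        weight = tau2 / (tau2 + se ** 2) if tau2 > 0 else 0.0  # reliability of this estimate
        shrunk.append(grand_mean + weight * (est - grand_mean))
        shrunk_se.append(se * weight ** 0.5)  # posterior standard deviation
    return shrunk, shrunk_se

# Two hypothetical schools with the same raw estimate (0.40) but different precision.
estimates = [0.40, 0.40, -0.10, 0.00, -0.30]
std_errors = [0.05, 0.30, 0.10, 0.10, 0.10]
shrunk, shrunk_se = shrink(estimates, std_errors)
print([round(x, 3) for x in shrunk])
```

The less precise of the two schools with a raw estimate of 0.40 is pulled much further toward the average. The shrunken standard errors are also smaller than the raw ones.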



6 Charter schools were asked to report test scores in two subjects: math and reading. 
Imputed Missing Data for Demographics Only. Data are missing for a substantial fraction 
of students. 7 Mathematica imputed missing demographic data using methods explained in the 
appendix. We also considered imputing for students whose prior test scores were missing when 
there was enough other information to make an informative prediction about what their missing 
scores were likely to have been. In the summer of 2008, we ran models to test this method and 
determined that the imputed values of the missing test scores might not be sufficiently good to 
improve our models. More details are presented in “Measuring School Effectiveness in Memphis— 
Year 2” (Potamites et al. 2009). 

Peer Effects Were Not Included in Model. The model may not control adequately for some 
variables that are not measured. One example is the extent to which students’ peers exert an 
influence on their test scores. Mathematica considered modifying the model to incorporate the 
possibility of peer effects associated with the average characteristics of the students at the school. 
We determined that the best methods currently available for making these adjustments did not 
appear to be very robust. In addition, they did not appear to have an effect on the relative rankings 
of schools. (More details are presented in Potamites et al. 2009.) 

C. Precision of the Rankings 

Mathematica estimates the precision of the school and teacher performance measures. One way 
to illustrate the uncertainty associated with the estimated rankings is to examine the 90 percent 
confidence interval around each ranking. This interval gives a school’s best and worst plausible rankings 
given the margin of error associated with that school’s estimated performance measure. 
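This section does not spell out how the rank intervals are computed; see the appendix. One simple way to obtain them, assumed here purely for illustration, is simulation. Draw each school's effect from a normal distribution centered on its estimate with its standard error, rank the schools in every draw, and take the 5th and 95th percentiles of each school's simulated ranks.

```python
import random

def rank_intervals(estimates, std_errors, draws=2000, seed=0):
    """Approximate 90 percent intervals for ranks (1 = best) by simulation.

    Returns a list of (best_rank, worst_rank) tuples, one per school.
    """
    rng = random.Random(seed)
    n = len(estimates)
    simulated = [[] for _ in range(n)]
    for _ in range(draws):
        draw = [rng.gauss(est, se) for est, se in zip(estimates, std_errors)]
        order = sorted(range(n), key=lambda i: draw[i], reverse=True)
        for rank, i in enumerate(order, start=1):
            simulated[i].append(rank)
    intervals = []
    for school_ranks in simulated:
        school_ranks.sort()
        best = school_ranks[int(0.05 * (draws - 1))]
        worst = school_ranks[int(0.95 * (draws - 1))]
        intervals.append((best, worst))
    return intervals

# Hypothetical estimates: one clearly strong school and three that are hard to tell apart.
intervals = rank_intervals([0.50, 0.10, 0.08, 0.05], [0.05, 0.08, 0.08, 0.08])
print(intervals)
```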

Figures 1, 2, and 3 show the confidence intervals for the school rankings in the elementary, 
middle, and high school grade ranges. These rankings are based on the full VAM, using one year of 
performance data. Schools are judged here on their performance in the 2007-08 school year. 8 The 
straight diagonal line is the ranking of each school, with the best schools having the lowest rank numbers. 
The jagged line above the diagonal shows the best rank in each school’s 90 percent confidence 
interval; the jagged line below the diagonal shows the worst rank for each school’s confidence 
interval. 

Since the model is used to identify the best-performing schools, the region of interest is the top 
right of the graph, documenting the precision of the top-ranked schools. For example, Figure 1 
shows that, given the uncertainty in our estimates of school rankings, with 90 percent confidence, 
the top 10 percent of elementary schools (n=8) all rank at least in the top 22 percent (18th out 
of 83 schools). The results shown in Figure 2 are only slightly less precise for middle schools, as the 
top 10 percent (n=4) rank in the top 27 percent at worst (10th out of 37). The results for high 
schools shown in Figure 3 are somewhat less precise at the top end of the distribution than those for 
the middle schools: the top 10 percent of high schools (n=3) rank in the top 32 percent at worst (8th of 
25). 



7 Ethnicity, gender, and special education status were missing for less than 1 percent of the final one-year analysis 
sample. Free or reduced price lunch status was missing for 6 percent and limited English proficiency status was missing 
for 12 percent. 

8 The 11 schools from Indiana and Michigan, where students are tested in the fall of the school year, are judged 
by their performance in 2006-07. 
Figure 1. 90% Confidence Intervals for One-Year Full VAM Estimates, Elementary Schools 

Source: Data collected and analyzed by Mathematica Policy Research. 

Note: The upper and lower lines are the upper and lower bounds of a 90 percent 
confidence interval around the school ranking, which is given as the middle line. 

Figure 2. 90% Confidence Intervals for One-Year Full VAM Estimates, Middle Schools 

Source: Data collected and analyzed by Mathematica Policy Research. 

Note: The upper and lower lines are the upper and lower bounds of a 90 percent 
confidence interval around the school ranking, which is given as the middle line. 

Figure 3. 90% Confidence Intervals for One-Year Full VAM Estimates, High Schools 

Source: Data collected and analyzed by Mathematica Policy Research. 

Note: The upper and lower lines are the upper and lower bounds of a 90 percent 
confidence interval around the school ranking, which is given as the middle line. 

One way to improve the precision of the estimates is to use additional years of data. 
Mathematica estimated school VAM models based on two years of performance and found that the 
estimated standard errors decreased from an average of .081 for the one-year full model after 
shrinking to .051 for the two-year full model after shrinking. 9 Similarly, estimated standard errors 
decreased in the teacher model, from an average of .160 for the one-year full model after shrinking 
to .132 for the two-year full model following shrinking. Further information on the precision of the 
estimates is contained in the appendix. 
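The percentage reductions behind these comparisons (and footnote 9) are simple relative changes in the average standard error. The short calculation below reproduces them from the figures reported in the text.

```python
def pct_reduction(before, after):
    """Percentage reduction from `before` to `after`."""
    return 100 * (before - after) / before

# Average standard errors reported in the text.
print(f"Shrinkage, one-year school model: {pct_reduction(0.088, 0.081):.0f}%")    # about 8%
print(f"Second year of data, school model: {pct_reduction(0.081, 0.051):.0f}%")   # about 37%
print(f"Second year of data, teacher model: {pct_reduction(0.160, 0.132):.1f}%")  # about 17.5%
```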

Figures 4, 5, and 6 show the 90 percent confidence intervals for the school rankings where two 
years of performance data were available. For schools with available data (116 out of 145), the 
estimates are based on their performance in both the 2007-08 and the 2006-07 school years, except 
that schools in Michigan and Indiana were judged based on 2005-06 and 2006-07. The 29 schools 
with performance results from 2007-08 only are also included in the two-year results. As expected, 
the precision increases. With 90 percent confidence, the top 10 percent of elementary schools all 
rank at least in the top 12 percent (10th out of 83), the top 10 percent of middle schools all are 
still within the top 20 percent (7th out of 35), and the top 10 percent of high schools are still in the top 
24 percent (6th out of 25). The top schools in each category are mostly the same in both the one- and 
two-year models. There is one switch among the elementary schools and one among the middle 
schools. 

9 For comparison, the average standard error in the one-year full model before shrinking was .088. So the shrinkage 
estimator reduced standard errors by 8 percent, while adding another year of data further reduced the mean standard 
error by 37 percent. See Table A.1 in the appendix. 

Figure 4. 90% Confidence Intervals for Two-Year Full VAM Estimates, Elementary Schools 

Source: Data collected and analyzed by Mathematica Policy Research. 

Note: The upper and lower lines are the upper and lower bounds of a 90 percent 
confidence interval around the school ranking, which is given as the middle line. 

Figure 5. 90% Confidence Intervals for Two-Year Full VAM Estimates, Middle Schools 

Source: Data collected and analyzed by Mathematica Policy Research. 

Note: The upper and lower lines are the upper and lower bounds of a 90 percent 
confidence interval around the school ranking, which is given as the middle line. 

Figure 6. 90% Confidence Intervals for Two-Year Full VAM Estimates, High Schools 

Source: Data collected and analyzed by Mathematica Policy Research. 

Note: The upper and lower lines are the upper and lower bounds of a 90 percent 
confidence interval around the school ranking, which is given as the middle line. 



For the teacher model, the most meaningful comparisons are between teachers within the same 
school. Because the rankings measure the net influence of the classroom plus school characteristics 
(such as the effect of the principal or school culture on student achievement), it is not possible to 
disentangle the effect of the teacher from that of the school. To illustrate the comparison of teachers 
within a school, Figure 7 shows the teacher scores, along with the upper and lower bounds of a 90 
percent confidence interval, for an award-winning elementary school, middle school, and high 
school. 

As Figure 7 shows, the lower bound of the confidence interval for the top-ranked teacher in each school is greater than 
zero but lower than the estimated score of the second-best teacher. For each grade level, the top-ranked 
teacher clearly scores higher than the lowest-ranked teacher, but teachers ranked near 
each other generally are not statistically distinguishable. 
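These comparisons can be sketched with normal-approximation confidence intervals and a test of the difference between two teachers. The estimates below are hypothetical. The test assumes the two teachers' estimates are independent. Teachers in the same school share a school component, so their estimates may in fact be correlated, and the test is therefore only an approximation.

```python
from statistics import NormalDist

Z90 = NormalDist().inv_cdf(0.95)  # two-sided 90 percent critical value, about 1.645

def ci90(estimate, se):
    """90 percent confidence interval for a single estimate."""
    return estimate - Z90 * se, estimate + Z90 * se

def distinguishable(est_a, se_a, est_b, se_b):
    """True if the difference between two independent estimates is significant at the 10 percent level."""
    z = (est_a - est_b) / (se_a ** 2 + se_b ** 2) ** 0.5
    return abs(z) > Z90

# Hypothetical teacher estimates and standard errors within one school.
top, second, bottom = (0.30, 0.10), (0.20, 0.10), (-0.25, 0.10)
print(ci90(*top))                      # lower bound above zero but below 0.20
print(distinguishable(*top, *second))  # False: adjacent teachers overlap
print(distinguishable(*top, *bottom))  # True: top versus bottom
```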
Figure 7. 90% Confidence Intervals for Two-Year Full VAM Teacher Estimates, for Sample 
Award-Winning Schools 

Source: Data collected by Mathematica Policy Research. 

Note: The upper and lower lines are the upper and lower bounds of a 90 percent 
confidence interval around the teacher VAM estimate, which is given as the middle 
point. 